Cooperative equilibria in the finite iterated prisoner's dilemma 
Nash equilibria are defined using uncorrelated behavioural or mixed joint probability distributions 
effectively assuming that players of bounded rationality must discard information to locate equi- 
libria. We propose instead that rational players will use all the information available in correlated 
distributions to constrain payoff function topologies and gradients to generate novel "constrained" 
equilibria, each one a backwards induction pathway optimizing payoffs in the constrained space. 
In the finite iterated prisoner's dilemma, we locate constrained equilibria maximizing payoffs via 
cooperation additional to the unconstrained (Nash) equilibrium maximizing payoffs via defection. 
Our approach clarifies the usual assumptions hidden in backwards induction. 



I. INTRODUCTION 

Payoff maximization in the single stage Prisoner's 
Dilemma (PD) locates a unique Nash equilibrium point, 
mutual defection, which garners players a non-Pareto effi- 
cient outcome. Finite repetition of this single stage game 
defines the finite Iterated Prisoner's Dilemma (IPD) 
which is apparently solved by inductively propagating 
the single stage Nash equilibrium of mutual defection 
backwards through every stage of the game to estab- 
lish "All Defect" as the unique Nash equilibrium path. 
This single rational play strategy prevents players from 
cooperating to achieve higher payoffs. Nevertheless, in 
experimental tests (see [1, 2, 3, 4]) people often cooperate
to garner a greater payoff, indicating either that
modelling in game theory is somehow incomplete or that 
people behave irrationally. Many different proposals have 
been made along these two lines including suggestions to 
modify definitions of rationality and to bound rationality 
[5, 6, 7, 8, 9], to take account of incomplete information
[10, 11, 12, 13] and uncertainty in the number of repeat
stages [14], to bound the complexity of implementable
strategies [15, 16, 17], to account for communication and
coordination costs [18], to incorporate reputation and
experimentation effects [19] or secondary utility functions
as in benevolence theory [20] or in moral discussions [21],
to include adaptive learning or fuzzy logic [22, 23], or
more directly, to employ comprehensive constructions of
normal form strategy tables [24, 25, 26]. Interestingly,
quantum correlations can be introduced to resolve the
prisoner's dilemma [27].

As noted above, these proposals generally modify the 
definition of either the IPD or of player rationality to ex- 
plain deviations from the single unique Nash equilibrium 
pathway. By contrast, in this paper we assume Common 
Knowledge of Rationality (CKR) for all players and no 
modification of the IPD game definition. Then we generalize
the analysis of the IPD to regimes where the fixed
point theorems underlying existing equilibria do not ap- 
ply. In providing an existence proof for mixed strategy 
equilibria, Nash and Kuhn made the overly restrictive as- 
sumption that each player's mixed or behavioural strat- 
egy choice probability distributions were continuous and 
uncorrelated and so strictly independent [28, 29]. This
assumption allowed the analytic continuation of game 
payoff functions over a convex probability polytope en- 
abling the use of fixed point theorems to locate mixed 
strategy equilibria [28]. The subsequent widespread use
of this existence proof as a definition of "Nash equilibria" 
requires rational players to locate equilibria by discarding 
information inherent in correlations, in effect, an assump- 
tion of "bounded" rationality. For multi-stage games, 
this same assumption that players of bounded rationality 
(employing myopic agents)3 must discard information by 
adopting uncorrelated play allowed Kuhn to define subgame
equilibria and subgame decompositions [29]. While
this assumption is always valid for single stage games 
with an empty history set, in multistage games the joint 
strategy choice probability distributions of all players can 
be more generally correlated through being conditioned 
on game history sets, with these correlations invalidat- 
ing the a priori assumption of uncorrelated mixed or be- 
havioural strategies. Hence, we propose that players of 
unbounded rationality, in contrast to those of "bounded" 
rationality, will make use of all available information by 
exploiting correlated mixed or behavioural joint strategy 
choice probability distributions to optimize payoffs. 

In this paper, by considering broader classes of corre- 
lated joint probability distributions, we significantly gen- 
eralize the analysis of the IPD at the cost of a greater an- 
alytic complexity stemming for instance, from the non- 
applicability of fixed point theorems. It may be ques- 
tioned whether this cost is a price worth paying. How- 
ever, we have seen little in the literature considering 
regimes where these a priori assumptions are invalid, 
though these regimes might provide the mathematical 
arena to model broader classes of experimental game behaviours,
especially given that well established techniques
for manipulating conditioned and correlated joint
probability distributions exist.

Previous efforts to model correlated strategy choices 
have attempted this task discursively — see for instance 
the descriptive derivations of strategy function equilibria 
in differential games [30] and trigger function equilibria
in supergames where players encourage adherence to a
cooperation pathway using deviations to "trigger" credible
punishments [31, 32, 33, 34, 35]. These previous
treatments have strongly insisted that players prespecify
an outcome for absolutely every possible (and impossible)
situation that might arise in a supergame (for justifying
quotations, see note [36]). As a result, current
supergame analysis must either assume that all mixed 
or behavioural strategy choice variables are independent 
and optimized via a Nash procedure, or that all variables 
are fully specified by strategy functions which are them- 
selves independently selected and again optimized via a 
Nash procedure. 

This paper fills in the missing middle ground here, and 
considers correlated mixed or behavioural joint proba- 
bility distributions conditioned by sets of strategy func- 
tions which specify only some fraction of the possible 
variables (defining a dependent set) while leaving the re- 
maining variables unspecified (an independent set) to be 
optimized by standard Nash techniques. (Here, the de- 
pendent variables are described by correlated probabil- 
ity distributions conditioned on game history sets.) As 
such, we introduce no additional properties to the IPD 
and our use of strategy functions and Nash optimiza- 
tion techniques are entirely standard. Optimizing func- 
tions over a set of independent variables given a set of 
constrained dependent variables is the subject matter of 
constrained optimization theory as in Lagrange multipliers
[2U E3 ) variational calculus 0, EMEU , and dynamic
programming [44]. Each adopted constraint set
generalizes the a priori assumption made by Nash and 
Kuhn that each player's mixed or behavioural strategy 
choice probability distributions are uncorrelated and sep- 
arable. In fact, Nash and Kuhn's multistage analysis cor- 
responds to the special case where the constraint set is 
empty, and Nash equilibria are defined only for empty 
constraint sets. Hence, in our generalized analysis (as in 
any constrained optimization procedure), we must first 
take account of applied constraint sets prior to locating 
"constrained equilibria" in the constrained probability 
space. (We note that games with mixed strategies constrained
to lie within a convex hyperpolyhedron by linear
inequalities and equations have been considered in Refs.
[45, 46].)

To avoid any confusion about our generalization, we
first review the original Nash equilibrium definitions in
the next section (Section II), and then generalize these
definitions to define constrained equilibria applicable to
non-empty constraint sets in the section following (Section III).
In Section IV we derive constrained equilibria in the IPD,
and observe the different player behaviours arising from
adopted constraints. Further, we apply the Nash equilibria
definitions on the constrained equilibrium space to
specify global equilibria. Finally we discuss the role of
the backwards induction argument as it is applied to
finite supergames in Section V.

II. REVIEW OF NASH EQUILIBRIA 

Throughout this paper, we consider supergames
formed by finitely iterating a single stage game over
$1 \le n \le N$ stages where each stage is played between two
non-communicating rational players denoted $P_x$ and $P_y$.
In the $n$th stage, players $P_x$ and $P_y$ choose their respective
stage strategies, denoted $x_n$ and $y_n$, from the same
strategy set $S$, that is $x_n, y_n \in S$, where $S = \{s_1, \dots, s_s\}$
and $s$ is the total number of strategies, or alternatively,
$S = \{s \mid s \in \mathbb{R},\ s_1 \le s \le s_s\}$ when strategy choice is
continuous. Game history sets $H_n$ known to both players
in stage $(n+1)$ record occurring events with $H_0 = \emptyset$,
and $H_n = \{x_1, \dots, x_n, y_1, \dots, y_n\}$. The payoffs $\Pi_x$ and
$\Pi_y$ for players $P_x$ and $P_y$ are each specified as mappings
(or functions) from the set of all chosen strategies
$H_N = \{x_1, \dots, x_N, y_1, \dots, y_N\}$ to the real line [47, 48]
via

$$\Pi_z = \Pi_z(x_1, \dots, x_N, y_1, \dots, y_N), \qquad z \in \{x, y\}. \tag{1}$$

We consider supergames where these mappings are defined
as summations of the respective $n$th stage player
payoffs $\pi_x(x_n, y_n) \ge 0$ and $\pi_y(x_n, y_n) \ge 0$, assumed
non-negative without loss of generality, giving

$$\Pi_z = \sum_{n=1}^{N} \pi_z(x_n, y_n), \qquad z \in \{x, y\}, \tag{2}$$

nominally functions of $2N$ variables $\{x_1, \dots, x_N\}$ and
$\{y_1, \dots, y_N\}$. The goal of each player is to maximize
their respective total payoff functions $\Pi_x$ and $\Pi_y$.

Strategy choices are defined as a functional mapping
from the game history sets $H_n$ to the strategy set $S$,
which specify after each history the specific strategy
choices $x_n$ and $y_n$ to be selected [47, 48]. Thus, following
standard definitions [47]:

A pure strategy [$x_n$] of player [$P_x$] is a function

$$x_n : H_{n-1} \to S = \{s_1, s_2, \dots\}. \tag{3}$$

In Ref. [28], Nash defined mixed strategy equilibria
in terms of expected value payoff functions. The most
general possible expected payoff functions are

$$\langle \Pi_z \rangle = \sum_{x_1 \dots x_N,\, y_1 \dots y_N = s_1}^{s_s} P(x_1, \dots, x_N, y_1, \dots, y_N)\, \Pi_z(x_1, \dots, x_N, y_1, \dots, y_N) \tag{4}$$

for $z \in \{x, y\}$, and where the probability that player $P_x$
($P_y$) chooses strategy choice $x_n \in S$ ($y_n \in S$) in stage $n$
for $1 \le n \le N$ is $P(x_1, \dots, x_N, y_1, \dots, y_N)$. Here, the
average is calculated over an ensemble of trials representing
every possible circumstance and outcome. In particular, 
the total ensemble consists of an infinite number of sub- 
ensembles, one for every possible joint probability dis- 
tribution, each one of which contains an infinite number 
of trial outcomes. This is in accord with von Neumann 
and Morgenstern's definition of a strategy as "a complete 
plan: a plan which specifies what choices [a player] will 
make in every possible situation, for every possible ac- 
tual information which [that player] may possess at that 
moment" [49]. We emphasize that a player's complete
strategy list is not synonymous with a mere listing of all
the possible choices determining every pathway through
the complete game tree despite this common usage in Ref.
[49]; such a listing is missing information. A full listing
of every possible situation which might occur will contain 
a successive listing of all the many possible joint strategy 
choice probability distributions which might be adopted 
by the players as well as subsidiary information about 
all the possible pathways through the associated game 
tree generated under those adopted distributions. If the 
information about the joint probability distributions is 
absent, this is equivalent to making a default assumption 
that all pathways are equally weighted (as conversely, 
differing joint probability distributions weight pathways 
differently). In actuality, discarding information about 
adopted joint probability distributions is equivalent to 
an assumption of bounded rationality, and the neces- 
sity of such restrictions has never been demonstrated. 
In particular, von Neumann and Morgenstern specifically
asserted that they were using a method of "indirect
proof" to imagine the form of a successful theory
and to test the consequences for problems and contradictions
[49]. Naturally, such contradictions were never
found as restricting the solution space to a valid
sub-ensemble ensures the absence of contradictions though
at the expense of incomplete results. In this paper we
do not make this unjustified assumption. As usual then,
using $P(A \text{ and } B) = P(A)P(B|A)$ we have

$$P(x_1, \dots, x_N, y_1, \dots, y_N) = P_{1x}(x_1) P_{1y}(y_1) \times P(x_2, \dots, y_2, \dots | H_1), \tag{5}$$



and the iterated identity 



$$P(x_n, \dots, x_N, y_n, \dots, y_N | H_{n-1}) = P_{nx}(x_n | H_{n-1})\, P_{ny}(y_n | H_{n-1})\, P(x_{n+1}, \dots, y_{n+1}, \dots | H_n) \tag{6}$$



successively applied for all $n$. Here, same stage choices
$x_n$ and $y_n$ are independent events conditioned on the
history set $H_{n-1}$ and so potentially correlated. We also elect
here to maintain the time ordering of conditioning events,
though this is not necessary: probability distributions
can be pre- or post-conditioned, so two events $A$ and $B$
occurring at two different times have joint probability
$P(A \text{ and } B) = P(A)P(B|A) = P(B)P(A|B)$. So also,



a player in their pregame analysis can condition events 
in one stage n on either earlier or later stage events as 
desired. The most general possible expected payoff func- 
tions are then 

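$$\begin{aligned}
\langle \Pi_z \rangle &= \sum_{\substack{s_1 \le x_N, y_N \le s_s \\ H_{N-1}}} P(H_{N-1})\, P(x_N, y_N | H_{N-1})\, \Pi_z(x_1, \dots) \\
&= \sum_{\substack{x_1 \dots x_N \\ y_1 \dots y_N}} P_{1x}(x_1) P_{1y}(y_1) \times \dots \times P_{Nx}(x_N | H_{N-1})\, P_{Ny}(y_N | H_{N-1})\, \Pi_z(x_1, \dots, x_N, y_1, \dots, y_N) \\
&= \sum_{n=1}^{N} \sum_{\substack{x_1 \dots x_n \\ y_1 \dots y_n}} P_{1x}(x_1) P_{1y}(y_1) \times \dots \times P_{nx}(x_n | H_{n-1})\, P_{ny}(y_n | H_{n-1})\, \pi_z(x_n, y_n),
\end{aligned} \tag{7}$$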

for $z \in \{x, y\}$. Each of the conditioned distributions
$P_{nz}(z_n | H_{n-1})$ can be written as a list of potentially
correlated behavioural strategy distributions

$$P_{nz}(z_n | H_{n-1}) = \begin{cases} P_{nz,H'_{n-1}}(z_n), & \text{if } H_{n-1} = H'_{n-1} \\ P_{nz,H''_{n-1}}(z_n), & \text{if } H_{n-1} = H''_{n-1} \\ \quad \vdots \end{cases} \tag{8}$$

with up to $2^{2(n-1)}$ entries, one for each possible history
set $H_{n-1}$. The individual distributions $P_{nz,H_{n-1}}(z_n)$ and
$P_{n'z',H_{n'-1}}(z'_{n'})$ can still be correlated as when, for
instance, a single "dice" is used to determine both outcomes.
(For completeness, note [50] lists the definitions
of correlated variables in terms of their covariance,
variance and means.) Such correlations further imply that
these distributions are not necessarily continuous. These
potentially correlated behavioural strategy distributions
allow writing the most general expected payoff functions
as

$$\langle \Pi_z \rangle = \sum_{\substack{x_1 \dots x_N \\ y_1 \dots y_N}} P_{1x}(x_1) P_{1y}(y_1) \times \dots \times P_{Nx,H_{N-1}}(x_N)\, P_{Ny,H_{N-1}}(y_N)\, \Pi_z(x_1, \dots, x_N, y_1, \dots, y_N). \tag{9}$$



Here, all possible contingent histories have been taken 
into account weighted by their respective conditioned 
probabilities. Players can now seek to optimize their pay- 
offs by applying any relevant optimization technique to 
these general expected payoff functions. Of course, if 
the functions are correlated and discontinuous then fixed 
point theorems cannot be used to locate optima, and 
also, if the variables are correlated it is absolutely nec- 
essary to resolve the correlations as imposed constraints 
prior to applying any optimization procedure. That is, 
if correlations exist, any optimization procedure such as 
backwards induction must take those correlations into 
account as imposed constraints before seeking to derive 
an optimal pathway. 
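
As a rough illustration of how the general expected payoff of Eq. (9) weights every contingent history, the following Python sketch enumerates all paths through a short two-action supergame. It is only a sketch under stated assumptions: the function and variable names (`expected_payoff`, `uniform`, `copy_opponent`) are hypothetical, the encoding 0 = cooperate and 1 = defect is assumed, and the stage payoff is borrowed from the prisoner's dilemma that the paper introduces later in Section IV.

```python
from itertools import product

ACTIONS = (0, 1)  # assumed encoding: 0 = cooperate, 1 = defect


def stage_payoff_x(x, y):
    """Illustrative stage payoff for P_x (prisoner's dilemma of Section IV)."""
    return 2 + x - 2 * y


def expected_payoff(behaviour_x, behaviour_y, stage_payoff, n_stages):
    """Weight every path through the game by its history-conditioned
    choice probabilities and average the summed stage payoffs."""
    total = 0.0
    for path in product(product(ACTIONS, repeat=2), repeat=n_stages):
        weight, payoff = 1.0, 0.0
        for n, (x, y) in enumerate(path):
            history = path[:n]  # all (x, y) pairs played before this stage
            weight *= behaviour_x(history)[x] * behaviour_y(history)[y]
            payoff += stage_payoff(x, y)
        total += weight * payoff
    return total


def uniform(history):
    """Ignore the history and randomize evenly (uncorrelated play)."""
    return {0: 0.5, 1: 0.5}


def copy_opponent(history):
    """Hypothetical P_x behaviour: cooperate first, then repeat P_y's previous
    choice, so the distribution is conditioned on the history set."""
    previous = history[-1][1] if history else 0
    return {a: float(a == previous) for a in ACTIONS}


print(expected_payoff(uniform, uniform, stage_payoff_x, 2))
print(expected_payoff(copy_opponent, uniform, stage_payoff_x, 2))
```

The second call differs from the first only in that the choice distribution of $P_x$ depends on the history, which is the kind of conditioning that the uncorrelated treatment excludes.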
Given these most general expected value payoff functions,
we now revisit the definition of Nash equilibria
as developed by Nash and as explicated by Kuhn
[29]. When introducing behavioral strategies Kuhn "explicitly
assumed that the choices of alternatives at different
history sets are made independently. Thus it
might be reasonable to call them 'uncorrelated' or 'locally
randomized' strategies." [29]. Such uncorrelated
behavioral strategies capture the myopic viewpoint of
non-communicating agents possessing "a local perspective
[which] decentralizes the strategy decision of player
i into a number of local decisions." In this, the
agent-normal game form, myopic agents at each history
set determine paths through the game tree using probability
distributions which are uncorrelated and independent.
This assumption allowed Kuhn to prove the
equivalence of uncorrelated behavioural strategies and
the uncorrelated mixed strategies introduced by Nash in
games of perfect recall [29]. This equivalence was established
by recognizing that player $P_x$ ($P_y$) could index
all their possible pure strategies by parameter $\alpha$ ($\beta$)
with the probability of playing that strategy being $P_x(\alpha)$
($P_y(\beta)$) given by an appropriate product of the uncorrelated
behavioural strategies $P_{nx,H_{n-1}}(x_n)$ ($P_{ny,H_{n-1}}(y_n)$).
This then allowed writing the non-general expected payoff
functions as



$$\langle \Pi_z \rangle = \sum_{\alpha \beta} P_x(\alpha)\, P_y(\beta)\, \Pi_z(\alpha, \beta) \tag{10}$$



for $z \in \{x, y\}$ where here, the summation is over an
appropriately limited set of $\alpha$ and $\beta$ values. Nash considered
the mixed strategies $P_x(\alpha)$ and $P_y(\beta)$ as "a collection
of non-negative numbers which have unit sum and
are in one to one correspondence with his pure strategies,"
so the expected payoff functions were linear in
the mixed strategies for each player allowing optimization
over a "convex subset of a real vector space" via
fixed point theorems [28]. This definition follows that
of von Neumann and Morgenstern in establishing a
one to one correspondence between a player's pure and
mixed strategies which are subject to appropriate
normalization constraints but to no other constraints such
as might result from using fully general joint strategy
choice probability distributions. Of course, these restrictive
assumptions limit the ensemble over which payoff
averages are calculated, and in the full ensemble correlated
behavioural strategies can break the one-to-one
correspondence between the mixed strategies and the full set
of uncorrelated pure strategies, can render expected payoff
functions discontinuous, and can invalidate the use of
fixed point theorems as an optimization technique.

The assumption that players employ uncorrelated
mixed or behavioural strategy choices then allows the
definition of Nash equilibria in terms of the probabilities
$p = \{P_x(\alpha), \forall \alpha\}$ and $q = \{P_y(\beta), \forall \beta\}$. Following Nash:

If and only if each player's mixed or behavioural
strategy choices are uncorrelated
and independent, then a 2-tuple $(p^*, q^*)$ of
unconditioned probability distributions forms
a mixed strategy equilibrium point if and only
if for all players,

$$\begin{aligned}
\langle \Pi_x(p^*, q^*) \rangle &= \max_{\forall p} \left[ \langle \Pi_x(p, q^*) \rangle \right] \\
\langle \Pi_y(p^*, q^*) \rangle &= \max_{\forall q} \left[ \langle \Pi_y(p^*, q) \rangle \right].
\end{aligned} \tag{11}$$

Thus an equilibrium point is a 2-tuple $(p^*, q^*)$
such that each player's mixed strategy maximizes
their payoff if the strategies of the others
are held fixed. Thus each player's strategy
is optimal against those of the others [28].

We note here that Nash proved an existence theorem 
in Ref. [28] and not a uniqueness theorem, and it is
well known that when the restrictive assumptions are not 
made then mixed strategy equilibria do not necessarily 

exist H3, IH, E3, US El, IHE Eg] 

Pure strategy Nash equilibria can be defined by further
specializing the strategy choice probability distributions
to be either zero or one, $P_x(\alpha), P_y(\beta) \in \{0, 1\}$ for all
values of $\alpha$ and $\beta$, so one or another pure strategy is
independently chosen by each player with certainty. Then
pure strategy Nash equilibria can be defined following
Nash [28]:

If and only if each player's pure strategy
choices are uncorrelated and independent,
then a 2-tuple $(\alpha^*, \beta^*)$ is a pure strategy
equilibrium point if and only if for all players,

$$\begin{aligned}
\Pi_x(\alpha^*, \beta^*) &= \max_{\forall \alpha} \left[ \Pi_x(\alpha, \beta^*) \right] \\
\Pi_y(\alpha^*, \beta^*) &= \max_{\forall \beta} \left[ \Pi_y(\alpha^*, \beta) \right].
\end{aligned} \tag{12}$$



Another situation where the Nash definition can be
applied is when all variables are fully specified by strategy
functions which are themselves independently selected.
In this case, strategy choices are conditioned
on earlier events and so possibly correlated in any
stage despite being chosen independently by each player
[30, 31, 32, 33, 34]. However, strategy functions introduced
in these approaches are very limited in the sense
that any implemented strategy function set must fully
specify an outcome for every stage and every possible
situation which might arise in a game. (For justifying
quotations see [36].) Because players $P_x$ and $P_y$ independently
choose sets of $N$ strategy functions denoted
$A_x = \{x_n : H_{n-1} \to S,\ 1 \le n \le N\}$ and similarly for $A_y$
to generate payoffs $\Pi_z(A_x, A_y)$ for $z \in \{x, y\}$, it is possible
to define strategy function Nash equilibria [28, 30]:

If and only if all $2N$ strategy choice variables
are functionally specified, a 2-tuple
strategy profile $\phi = \{A_x^*, A_y^*\}$ is a strategy
function Nash equilibrium if and only if for all
players,

$$\begin{aligned}
\Pi_x(A_x^*, A_y^*) &= \max_{A_x} \left[ \Pi_x(A_x, A_y^*) \right] \\
\Pi_y(A_x^*, A_y^*) &= \max_{A_y} \left[ \Pi_y(A_x^*, A_y) \right].
\end{aligned} \tag{13}$$

Essentially equivalent definitions underlie trigger function
equilibria H3 SHI.

Based on the equilibria definitions above, supergame 
analysis must either assume that all variables are inde- 
pendent and optimized via a Nash procedure, or that 
all variables are fully specified by strategy functions. To 
treat more general strategy functions, it is necessary to 
extend the Nash equilibrium concept to regimes where 
fixed point theorems are inapplicable. The next section 
does this by defining new constrained equilibria. 

III. CONSTRAINED EQUILIBRIA 

In this section we define constrained equilibria through 
allowing the use of correlated multivariate probability 
distributions to optimize expected payoffs. 

We note firstly that neither Nash nor Kuhn provided 
any rationale requiring rational players to restrict the 
size of the ensemble used to calculate payoff averages by 
only employing uncorrelated mixed or behavioural prob- 
ability distributions. Likely, this assumption was made 
in the context of an existence proof to simplify analysis 
as the fully general treatment of correlated multivariate 
probability distributions is difficult. However, adopting 
correlated distributions can often greatly simplify anal- 
ysis. Consider that when events A and B are perfectly 
correlated, the joint probability of both events reduces 
to $P(A \text{ and } B) = P(A)P(B|A) = P(A)$, so one variable
entirely disappears reducing the dimensionality of
the problem space and simplifying the problem. (This 
dimensionality reduction is a normal result whenever con- 
straints are applied in optimization problems.) 

In fact, any assumption that players must adopt the 
restricted ensemble available under uncorrelated play is 
equivalent to a claim that players must discard infor- 
mation and so amounts to an assumption of bounded 
rationality. (For completeness, note [53] details the mutual
information content of correlated joint probability
distributions.) How might rational players exploit the 
information in correlated distributions? A minimum nec- 
essary condition for the existence of an equilibrium point 
is that opponents must be unable to improve their pay- 
offs by altering their strategy choices, so essentially, op- 
ponent's payoff function "gradients" must be negative 
at equilibrium points. Now, rational players can exploit 
correlations to constrain the payoff function space topol- 
ogy so as to alter the possible directions in which payoff 
functions can change. As "gradients" can only be taken 
along allowed directions, such constraints alter the pay- 
off function "gradients" at any point. Thus, it is possible 
for rational players to choose correlations to alter payoff 



function "gradients" to generate novel equilibria at novel 
points. In this paper, we assume that rational players 
will make use of all available information including that 
implicit in correlated joint probability distributions. 

In this paper, constrained equilibria are defined for
pairs of sets of strategy functions $\{\{X_n\}, \{Y_n\}\}$. For the
pure strategy case, these strategy functions $Z_n$ may be
represented as

$$\begin{aligned}
x_n &= X_n(x_1, \dots, x_{n-1}, y_1, \dots, y_{n-1}) \\
y_n &= Y_n(x_1, \dots, x_{n-1}, y_1, \dots, y_{n-1}).
\end{aligned} \tag{14}$$

Here $x_n$ and $y_n$ might be independent variables
or dependent on some or all of the variables
$x_1, \dots, x_{n-1}, y_1, \dots, y_{n-1}$. For the pure strategy case,
neither these variables nor the strategy functions are
probabilistic. In terms of conditioned probability
distributions, these constraints take the form

$$\begin{aligned}
P_{nx}(x_n | H_{n-1}) &= \delta_{x_n, X_n(H_{n-1})} \\
P_{ny}(y_n | H_{n-1}) &= \delta_{y_n, Y_n(H_{n-1})},
\end{aligned} \tag{15}$$

where $\delta_{ab}$ is one if $a = b$ and zero otherwise. (Of
course, more general distributions could be considered.)
These functional notations, though widely used to represent
payoff functions, have not been widely employed
for strategy functions. We note that they have been
used in the derivation of best reply (strategy) functions
0|3£|, Stackelberg duopolies [33, a two-stage prisoner's
dilemma [58], correlation in randomized strategies [56],
and differential games [30], while a number of texts describe
strategies as "functions" without actually introducing
a function notation.

Suppose player $P_x$ (or $P_y$) chooses a set of strategy
functions, which we might conveniently call their
algorithm denoted as $A_x = \{X_m, \dots, X_j\}$ (or $A_y =
\{Y_n, \dots, Y_j\}$), to constrain some (or none) of their strategy
choice variables creating a set of dependent variables
and a remaining set of independent variables. For notational
convenience, we relabel the independent variable
sets for $P_x$ and $P_y$ as respectively $\alpha = \{u_1, \dots, u_x\} \in S^{\times x}$
and $\beta = \{v_1, \dots, v_y\} \in S^{\times y}$ where $0 < x + y \le 2N$.
We assume here that at least one variable remains independent
as otherwise, the optimization becomes trivial.
Immediately then, the payoff functions for the players
become composite with reduced dimensionality (and
changed properties) given by

$$\Pi_z \to \Pi_z(\alpha, \beta), \qquad z \in \{x, y\}. \tag{16}$$

Here, no dependent variables, such as $x_2 = X_2(x_1, y_1)$
say, appear in the composite payoff functions.

With the representation of strategy functions in terms 
of the independent variables, the Nash equilibrium defini- 
tion can be applied to the space of independent variables 
to define pure strategy constrained equilibria via: 

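Given a particular constraint set $A_x$ and $A_y$,
a 2-tuple $(\alpha^*, \beta^*)$ is a pure strategy constrained
equilibrium if and only if for all players,

$$\begin{aligned}
\Pi_x(\alpha^*, \beta^*) &= \max_{\forall \alpha} \left[ \Pi_x(\alpha, \beta^*) \right] \\
\Pi_y(\alpha^*, \beta^*) &= \max_{\forall \beta} \left[ \Pi_y(\alpha^*, \beta) \right].
\end{aligned} \tag{17}$$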
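
To make the pure strategy definition concrete, the sketch below searches by brute force for constrained equilibria over the independent variables only. Everything specific in it is an assumption chosen for illustration: the stage payoffs are those of the prisoner's dilemma treated in Section IV (0 = cooperate, 1 = defect), the constraint set has $P_x$ choose $x_1$ freely and then copy $P_y$'s previous choice while $P_y$ remains unconstrained, and the function names are hypothetical.

```python
from itertools import product

ACTIONS = (0, 1)  # assumed encoding: 0 = cooperate, 1 = defect


def total_payoffs(xs, ys):
    """Sum illustrative stage payoffs 2 + x - 2y (P_x) and 2 - 2x + y (P_y)."""
    pi_x = sum(2 + x - 2 * y for x, y in zip(xs, ys))
    pi_y = sum(2 - 2 * x + y for x, y in zip(xs, ys))
    return pi_x, pi_y


def composite_payoffs(alpha, beta):
    """Composite payoffs written over independent variables only.

    Hypothetical constraint set: P_x chooses x_1 freely (alpha = (x_1,))
    and every later choice is dependent, x_n = y_{n-1}. P_y's constraint
    set is empty, so beta = (y_1, ..., y_N) is fully independent.
    """
    ys = list(beta)
    xs = [alpha[0]] + ys[:-1]
    return total_payoffs(xs, ys)


def constrained_equilibria(n_stages):
    """List (alpha, beta, payoffs) where neither player can gain by
    changing only their own independent variables."""
    alphas = list(product(ACTIONS, repeat=1))
    betas = list(product(ACTIONS, repeat=n_stages))
    found = []
    for alpha, beta in product(alphas, betas):
        pi_x, pi_y = composite_payoffs(alpha, beta)
        x_is_best = all(composite_payoffs(a, beta)[0] <= pi_x for a in alphas)
        y_is_best = all(composite_payoffs(alpha, b)[1] <= pi_y for b in betas)
        if x_is_best and y_is_best:
            found.append((alpha, beta, (pi_x, pi_y)))
    return found


for point in constrained_equilibria(3):
    print(point)
```

Note that the dependent variables never appear as arguments of `composite_payoffs`; the maximization of Eq. (17) is carried out purely over $\alpha$ and $\beta$, and a different assumed constraint set would change the space being searched.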



We now generalize this definition to mixed strategy
constrained equilibria. In this case, strategies at any
stage can be probabilistic, so the strategy functions of
Eqs. (14) and (15) applied at stage $n$ map each history
set $H_{n-1}$ to a probability distribution. The set of these
probabilistic strategy functions $P_{nz}(z_n | H_{n-1})$ forms an
algorithm $A_z$. For a pair of algorithms $\{A_x, A_y\}$, the
expected total payoffs given as Eq. (9) can be rewritten
in terms of independent probability distributions $\{p, q\}$
only, where $p = \{p_1, \dots\}$ specifies the probability that
any allowed value of $\alpha$ is implemented, while $q = \{q_1, \dots\}$
specifies the probability that any allowed value of $\beta$ is
implemented. Here we used the same relabelling as in the
pure strategy case above. Then, we can define mixed
strategy constrained equilibria as:

Given a particular constraint set $A_x$ and $A_y$,
a 2-tuple $(p^*, q^*)$ is a mixed strategy constrained
equilibrium if and only if for all players,

$$\begin{aligned}
\langle \Pi_x(p^*, q^*) \rangle &= \max_{\forall p} \left[ \langle \Pi_x(p, q^*) \rangle \right] \\
\langle \Pi_y(p^*, q^*) \rangle &= \max_{\forall q} \left[ \langle \Pi_y(p^*, q) \rangle \right].
\end{aligned} \tag{18}$$



As is usual in optimization theory, every alternate 
strategy function set {A x , A y } imposes constraints on ei- 
ther the space of possible pure strategy choice variables 
or the space of possible mixed strategy probability dis- 
tributions. Geometrically, these constraints take a cross- 
section onto some subspace wherein all constraints are 
satisfied, and in which the composite payoff functions 
exhibit changed continuity properties and altered max- 
ima. The composite payoff functions of reduced dimen- 
sionality define pruned extensive form game trees involv- 
ing only independent variables — variational optimization 
techniques can only ever be applied to independent vari- 
ables. Consequently, subgame decompositions, and Nash 
equilibria can be applied to extensive form game trees 
only after these have been pruned of dependent variables. 
As is usual in constrained optimization problems, dif- 
ferent constraint sets generate novel trees defining novel 
equilibria. In the next section we demonstrate how these 
new equilibria emerge in the IPD analysis. A question 
which might arise here is how to determine the best equi- 
librium among these new equilibria. As is well known in 
game theory, in general, there is no simple way to choose 
between many alternate Nash equilibria |2(| |33, |3a |5!j . 
However, through the example of an IPD in the next 
section, we will show a way to address this issue using 
standard Nash techniques. 



IV. CONSTRAINED EQUILIBRIA IN THE 
FINITE ITERATED PRISONER'S DILEMMA 



In this section we determine constrained equilibria in 
the finite IPD and demonstrate that cooperation can 
naturally emerge as a consequence of constraints. In 
this supergame, each player has two possible single stage 
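strategy choices $S = \{C, D\}$ for Cooperate and Defect
respectively with stage payoffs $\pi_x(x_n, y_n) \ge 0$ and
$\pi_y(x_n, y_n) \ge 0$ determined by the payoff matrix

$$\begin{array}{cc|cc}
 & & \multicolumn{2}{c}{P_y} \\
 & & C & D \\ \hline
P_x & C & (2,2) & (0,3) \\
 & D & (3,0) & (1,1).
\end{array} \tag{19}$$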



This payoff matrix defines single stage payoff functions

$$\begin{aligned}
\pi_x(x_n, y_n) &= 2 + x_n - 2y_n, \\
\pi_y(x_n, y_n) &= 2 - 2x_n + y_n,
\end{aligned} \tag{20}$$

where $z_n$ represents the strategy choice for player $P_z$ such
that 0 represents cooperation and 1 represents defection.
Total game payoffs of a finite IPD of length $N$ are
then

$$\begin{aligned}
\Pi_x(x_1 \dots x_N, y_1 \dots y_N) &= \sum_{n=1}^{N} (2 + x_n - 2y_n) \\
\Pi_y(x_1 \dots x_N, y_1 \dots y_N) &= \sum_{n=1}^{N} (2 - 2x_n + y_n).
\end{aligned} \tag{21}$$
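
The correspondence between the payoff matrix (19) and the linear payoff functions (20) and (21) can be checked mechanically. The Python sketch below is such a check under the encoding stated in the text (0 for cooperation, 1 for defection); the dictionary layout and function names are hypothetical conveniences, not part of the paper.

```python
PAYOFF_MATRIX = {  # (P_x choice, P_y choice) -> (payoff to P_x, payoff to P_y)
    ("C", "C"): (2, 2), ("C", "D"): (0, 3),
    ("D", "C"): (3, 0), ("D", "D"): (1, 1),
}
ENCODE = {"C": 0, "D": 1}  # 0 = cooperate, 1 = defect


def pi_x(x, y):
    """Stage payoff to P_x as a linear function of the encoded choices."""
    return 2 + x - 2 * y


def pi_y(x, y):
    """Stage payoff to P_y as a linear function of the encoded choices."""
    return 2 - 2 * x + y


def matches_matrix():
    """True if the linear payoff functions reproduce every matrix entry."""
    return all(
        (pi_x(ENCODE[a], ENCODE[b]), pi_y(ENCODE[a], ENCODE[b])) == entry
        for (a, b), entry in PAYOFF_MATRIX.items()
    )


def stage_nash_equilibria():
    """Pure strategy equilibria of the single stage game, found by checking
    every unilateral deviation."""
    return [
        (x, y)
        for x in (0, 1)
        for y in (0, 1)
        if all(pi_x(x, y) >= pi_x(a, y) for a in (0, 1))
        and all(pi_y(x, y) >= pi_y(x, b) for b in (0, 1))
    ]


def total_payoffs(xs, ys):
    """Total supergame payoffs as sums of stage payoffs."""
    return (
        sum(pi_x(x, y) for x, y in zip(xs, ys)),
        sum(pi_y(x, y) for x, y in zip(xs, ys)),
    )


print(matches_matrix(), stage_nash_equilibria())
print(total_payoffs([0, 0, 0], [0, 0, 0]), total_payoffs([1, 1, 1], [1, 1, 1]))
```

For three stages, mutual cooperation throughout yields (6, 6) against (3, 3) for mutual defection, which is the gap between the Pareto efficient outcome and the single stage equilibrium that motivates the analysis.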



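Following Eq. (9), the expected payoff functions for players
$P_x$ and $P_y$ are then

$$\begin{aligned}
\langle \Pi_x \rangle &= 2N + \sum_{n=1}^{N} \sum_{\substack{x_1 \dots x_n \\ y_1 \dots y_n}} P_{1x}(x_1) P_{1y}(y_1) \times \dots \times P_{nx,H_{n-1}}(x_n)\, P_{ny,H_{n-1}}(y_n)\, (x_n - 2y_n), \\
\langle \Pi_y \rangle &= 2N + \sum_{n=1}^{N} \sum_{\substack{x_1 \dots x_n \\ y_1 \dots y_n}} P_{1x}(x_1) P_{1y}(y_1) \times \dots \times P_{nx,H_{n-1}}(x_n)\, P_{ny,H_{n-1}}(y_n)\, (-2x_n + y_n).
\end{aligned} \tag{22}$$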

We now derive the unconstrained Nash equilibrium for 
the IPD after applying the assumption of bounded ra- 
tionality so all probability distributions are uncorrelated. 
We first note that the total rate of change of the expected 
payoff function with respect to the changing probability 
distributions is 



$$\frac{d \langle \Pi_z \rangle}{d[P_{nz,H_{n-1}}(1)]} = \frac{\partial \langle \Pi_z \rangle}{\partial [P_{nz,H_{n-1}}(1)]} - \frac{\partial \langle \Pi_z \rangle}{\partial [P_{nz,H_{n-1}}(0)]} \tag{23}$$

due to the normalization constraint $P_{nz,H_{n-1}}(0) =
1 - P_{nz,H_{n-1}}(1)$. The shorthand notation $H_n =
\{H_{n-1}, x_n, y_n\}$ and some algebra allows writing the
optimization conditions for player $P_x$ as the set of
simultaneous equations

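\begin{align*}
\frac{d\langle\Pi_x\rangle}{d[P_{1x}(1)]} &= \ldots \\
&\;\;\vdots \\
\frac{d\langle\Pi_x\rangle}{d[P_{(N-1)x,H_{N-2}}(1)]} &= 1 + \sum_{\substack{x_1 \ldots x_{N-2} \\ y_1 \ldots y_{N-2}}} P_{1x}(x_1) \ldots P_{(N-2)y,H_{N-3}}(y_{N-2}) \\
&\quad \times \sum_{y_{N-1}} P_{(N-1)y,H_{N-2}}(y_{N-1}) \sum_{x_N y_N} (x_N - 2y_N) \\
&\quad \times \Big[ P_{Nx,\{H_{N-2},1,y_{N-1}\}}(x_N) P_{Ny,\{H_{N-2},1,y_{N-1}\}}(y_N) \\
&\qquad - P_{Nx,\{H_{N-2},0,y_{N-1}\}}(x_N) P_{Ny,\{H_{N-2},0,y_{N-1}\}}(y_N) \Big], \\
\frac{d\langle\Pi_x\rangle}{d[P_{Nx,H_{N-1}}(1)]} &= 1. \tag{24}
\end{align*}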

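The equivalent simultaneous optimization conditions for
player $P_y$ are
\begin{align*}
\frac{d\langle\Pi_y\rangle}{d[P_{1y}(1)]} &= \ldots \\
&\;\;\vdots \\
\frac{d\langle\Pi_y\rangle}{d[P_{(N-1)y,H_{N-2}}(1)]} &= 1 + \sum_{\substack{x_1 \ldots x_{N-2} \\ y_1 \ldots y_{N-2}}} P_{1x}(x_1) \ldots P_{(N-2)y,H_{N-3}}(y_{N-2}) \\
&\quad \times \sum_{x_{N-1}} P_{(N-1)x,H_{N-2}}(x_{N-1}) \sum_{x_N y_N} (y_N - 2x_N) \\
&\quad \times \Big[ P_{Nx,\{H_{N-2},x_{N-1},1\}}(x_N) P_{Ny,\{H_{N-2},x_{N-1},1\}}(y_N) \\
&\qquad - P_{Nx,\{H_{N-2},x_{N-1},0\}}(x_N) P_{Ny,\{H_{N-2},x_{N-1},0\}}(y_N) \Big], \\
\frac{d\langle\Pi_y\rangle}{d[P_{Ny,H_{N-1}}(1)]} &= 1. \tag{25}
\end{align*}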



Subsequently each player, denoted $P_z$, solves their
respective set of simultaneous equations to maximize their
payoff by setting $P_{Nz,H_{N-1}}(1) = 1$ for all history sets
$H_{N-1}$, then $P_{(N-1)z,H_{N-2}}(1) = 1$ for all history
sets $H_{N-2}$, and so on. The final result is that
both players defect at every stage, giving the optima
$(x_n, y_n) = (1, 1) = (D, D)$ for all $n$. At this point, payoffs
are $(\langle\Pi_x\rangle, \langle\Pi_y\rangle) = (N, N)$.
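As a sanity check on this result, one can verify by brute force that, when a player's choices are fixed in advance and do not depend on the history, defecting at every stage is the unique best reply to any opponent sequence. The sketch below assumes this history-independent simplification (the analysis above works with behavioural probability distributions), and the helper names are hypothetical.

```python
from itertools import product


def total_payoff_x(xs, ys):
    """Total payoff to P_x from Eq. (21); 0 = cooperate, 1 = defect."""
    return sum(2 + x - 2 * y for x, y in zip(xs, ys))


def best_fixed_replies(ys):
    """Return every fixed sequence xs that maximises P_x's payoff against ys."""
    candidates = list(product((0, 1), repeat=len(ys)))
    best = max(total_payoff_x(xs, ys) for xs in candidates)
    return [xs for xs in candidates if total_payoff_x(xs, ys) == best]


def all_defect_is_unique_best_reply(n_stages):
    """Check the claim against every opponent sequence of length n_stages."""
    all_defect = (1,) * n_stages
    return all(
        best_fixed_replies(ys) == [all_defect]
        for ys in product((0, 1), repeat=n_stages)
    )
```

Because each $x_n$ enters $\Pi_x$ with coefficient $+1$ when the opponent's moves are independent of it, the check succeeds for any number of stages; by symmetry the same holds for $P_y$, and mutual defection yields the payoff $N$ to each player.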

Now, we generate constrained equilibria using correlated
distributions in the most general expected payoff
functions introduced earlier. As a first step, we consider
that player $P_x$ adopts Markovian-like (MKV) strategy
functions dependent only on the results of the previous
stage via
\[
x_n(y_{n-1}) = y_{n-1}, \tag{26}
\]
for $2 \leq n \leq N$. We assume $P_y$ adopts an empty
constraint set so all $P_{ny,H_{n-1}}(y_n)$ distributions are
independent. The imposed constraints are equivalent to
the correlated probability distributions
$P_{nx}(x_n|H_{n-1}) = \delta_{x_n, y_{n-1}}$ so the most general
expected payoff functions become
\[
\langle\Pi_z\rangle = \sum_{x_1, y_1, \ldots, y_N} P_{1x}(x_1) P_{1y}(y_1) P_{2y}(y_2|H_1) \times \ldots
\times P_{Ny}(y_N|H_{N-1}) \Pi_z(x_1, y_1, \ldots, y_N) \tag{27}
\]
for $z \in \{x, y\}$, with generated payoffs
\begin{align*}
\Pi_x(x_1, y_1, \ldots, y_N) &= 2N + x_1 - \sum_{n=1}^{N-1} y_n - 2y_N, \\
\Pi_y(x_1, y_1, \ldots, y_N) &= 2N - 2x_1 - \sum_{n=1}^{N-1} y_n + y_N. \tag{28}
\end{align*}



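The expected payoff functions for players $P_x$ and $P_y$ are
then
\begin{align*}
\langle\Pi_x\rangle &= 2N + \sum_{x_1} P_{1x}(x_1) x_1
 - \sum_{n=1}^{N-1} \sum_{y_1 \ldots y_n} P_{1y}(y_1) \ldots P_{ny,H_{n-1}}(y_n) y_n \\
&\quad - 2 \sum_{y_1 \ldots y_N} P_{1y}(y_1) \ldots P_{Ny,H_{N-1}}(y_N) y_N, \\
\langle\Pi_y\rangle &= 2N - 2 \sum_{x_1} P_{1x}(x_1) x_1
 - \sum_{n=1}^{N-1} \sum_{y_1 \ldots y_n} P_{1y}(y_1) \ldots P_{ny,H_{n-1}}(y_n) y_n \\
&\quad + \sum_{y_1 \ldots y_N} P_{1y}(y_1) \ldots P_{Ny,H_{N-1}}(y_N) y_N. \tag{29}
\end{align*}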



The generated constrained equilibria are now calculated
by applying the assumption of bounded rationality so all
remaining distributions are uncorrelated. Immediately
then, player $P_x$ optimizes their expected payoff by
satisfying the condition
\[
\frac{d\langle\Pi_x\rangle}{d[P_{1x}(1)]} = 1. \tag{30}
\]
Consequently, player $P_x$ optimizes their first and final
stage payoff by setting $P_{1x}(1) = 1$ and so defects in this
first stage. The shorthand notation $H_n = \{H_{n-1}, y_n\}$
for $n > 1$ and some algebra allow writing the optimization
conditions for player $P_y$ as the set of simultaneous
equations

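\begin{align*}
\frac{d\langle\Pi_y\rangle}{d[P_{1y}(1)]} &= \ldots \\
&\;\;\vdots \\
\frac{d\langle\Pi_y\rangle}{d[P_{(N-1)y,H_{N-2}}(1)]} &= -1 + \sum_{y_1 \ldots y_{N-2}} P_{1y}(y_1) \ldots P_{(N-2)y,H_{N-3}}(y_{N-2}) \\
&\quad \times \sum_{y_N} \Big[ P_{Ny,\{H_{N-2},1\}}(y_N) - P_{Ny,\{H_{N-2},0\}}(y_N) \Big] y_N, \\
\frac{d\langle\Pi_y\rangle}{d[P_{Ny,H_{N-1}}(1)]} &= 1. \tag{31}
\end{align*}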



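Hence, player $P_y$ optimizes their payoff by setting
$P_{Ny,H_{N-1}}(1) = 1$ for every history set $H_{N-1}$, by
setting $P_{(N-1)y,H_{N-2}}(1) = 0$ for every history set
$H_{N-2}$, and eventually by setting $P_{ny,H_{n-1}}(1) = 0$
for $1 \leq n \leq (N-1)$. These conditions locate the
constrained equilibria at the point
$(x_1, y_1, \ldots, y_N) = (1, 0, \ldots, 0, 1)$ generating the
play sequence
\[
(x_n, y_n) = (D,C)(C,C) \ldots (C,C)(C,D) \tag{32}
\]
to give expected payoffs
$(\langle\Pi_x\rangle, \langle\Pi_y\rangle) = (2N-1, 2N-1)$.
Here, player $P_x$ defects in the first stage as their opponent
cannot respond without decreasing their payoff, while $P_y$
can defect in the last stage when $P_x$ can no longer
respond.

Next we treat another combination of constraints,
assuming the Markovian strategy algorithms for both
players as
\[
x_n = y_{n-1}, \qquad y_n = x_{n-1} \tag{33}
\]
for $2 \leq n \leq N$, equivalent to the conditioned
probability distributions $P_{nx}(x_n|H_{n-1}) = \delta_{x_n, y_{n-1}}$ and
$P_{ny}(y_n|H_{n-1}) = \delta_{y_n, x_{n-1}}$. These constraint sets
project the most general expected payoff functions to
\[
\langle\Pi_z\rangle = \sum_{x_1, y_1} P_{1x}(x_1) P_{1y}(y_1) \Pi_z(x_1, y_1) \tag{34}
\]
for $z \in \{x, y\}$, where for a given play sequence $(x_1, y_1)$,
the payoffs are
\begin{align*}
\Pi_x &= \begin{cases}
2N - \frac{N}{2} x_1 - \frac{N}{2} y_1, & N \text{ even} \\
2N - \frac{N-3}{2} x_1 - \frac{N+3}{2} y_1, & N \text{ odd}
\end{cases} \\
\Pi_y &= \begin{cases}
2N - \frac{N}{2} x_1 - \frac{N}{2} y_1, & N \text{ even} \\
2N - \frac{N+3}{2} x_1 - \frac{N-3}{2} y_1, & N \text{ odd.}
\end{cases} \tag{35}
\end{align*}
The $N$ stage supergame has now been exactly reduced to
a single stage game with variables $x_1$ and $y_1$ with payoff
matrices, for $N$ even, of
\[
\begin{array}{cc|cc}
 & & \multicolumn{2}{c}{P_y} \\
 & (\Pi_x, \Pi_y) & C & D \\ \hline
P_x & C & (2N, 2N) & (\tfrac{3}{2}N, \tfrac{3}{2}N) \\
 & D & (\tfrac{3}{2}N, \tfrac{3}{2}N) & (N, N),
\end{array} \tag{36}
\]
and for odd $N$ of
\[
\begin{array}{cc|cc}
 & & \multicolumn{2}{c}{P_y} \\
 & (\Pi_x, \Pi_y) & C & D \\ \hline
P_x & C & (2N, 2N) & \tfrac{3}{2}[N-1, N+1] \\
 & D & \tfrac{3}{2}[N+1, N-1] & (N, N).
\end{array} \tag{37}
\]
Here, the constraining strategy functions have changed
the off-diagonal elements of the effective payoff matrix to
modify equilibria. As usual, the generated constrained
equilibria are now calculated by applying the assumption
of bounded rationality so all remaining distributions are
uncorrelated. The expected payoff functions are then
\begin{align*}
\langle\Pi_x\rangle &= \begin{cases}
2N - \frac{N}{2} P_{1x}(1) - \frac{N}{2} P_{1y}(1), & N \text{ even} \\
2N - \frac{N-3}{2} P_{1x}(1) - \frac{N+3}{2} P_{1y}(1), & N \text{ odd}
\end{cases} \\
\langle\Pi_y\rangle &= \begin{cases}
2N - \frac{N}{2} P_{1x}(1) - \frac{N}{2} P_{1y}(1), & N \text{ even} \\
2N - \frac{N+3}{2} P_{1x}(1) - \frac{N-3}{2} P_{1y}(1), & N \text{ odd.}
\end{cases} \tag{38}
\end{align*}
As usual, the constrained equilibria are located via
\begin{align*}
\frac{d\langle\Pi_x\rangle}{d[P_{1x}(1)]} &= \begin{cases}
-\frac{N}{2}, & N \text{ even} \\
-\frac{1}{2}(N-3), & N \text{ odd}
\end{cases} \\
\frac{d\langle\Pi_y\rangle}{d[P_{1y}(1)]} &= \begin{cases}
-\frac{N}{2}, & N \text{ even} \\
-\frac{1}{2}(N-3), & N \text{ odd.}
\end{cases} \tag{39}
\end{align*}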



These conditions select the equilibrium points
$P_{1x}(1) = 0$ and $P_{1y}(1) = 0$, or $(x_1, y_1) = (0, 0) = (C, C)$,
for either $N$ even or for $N$ odd and greater than 3, while
for $N = 1$ the equilibrium is $P_{1x}(1) = 1$ and $P_{1y}(1) = 1$,
or $(x_1, y_1) = (1, 1) = (D, D)$. When $N = 3$ these
conditions are satisfied for any values of $(x_1, y_1)$, requiring
examination of actual payoffs, which motivates the selection
$(x_1, y_1) = (0, 0) = (C, C)$. The generated sequences of
play are



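\[
\begin{array}{c|c|c|c}
N & (x_1, y_1) & (x_n, y_n) & (\langle\Pi_x\rangle, \langle\Pi_y\rangle) \\ \hline
1 & (1,1) & (DD) & (1,1) \\
N \geq 2 & (0,0) & (CC) \ldots (CC) & (2N, 2N).
\end{array} \tag{40}
\]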
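The reduction to a single stage game in $(x_1, y_1)$ can be checked numerically. The sketch below assumes both players follow the copy-the-previous-move rule of Eq. (33) and uses hypothetical helper names; it simulates the play sequence and lists the pure-strategy Nash equilibria of the reduced game using a weak best-reply test. Mixed equilibria are not searched.

```python
from itertools import product


def mkv_payoffs(n_stages, x1, y1):
    """Total payoffs when both players copy the opponent's previous move."""
    x, y = x1, y1
    px = py = 0
    for _ in range(n_stages):
        px += 2 + x - 2 * y
        py += 2 - 2 * x + y
        x, y = y, x  # Eq. (33): x_n = y_{n-1}, y_n = x_{n-1}
    return px, py


def pure_nash(n_stages):
    """Pure-strategy Nash equilibria (x1, y1) of the reduced one-shot game."""
    equilibria = []
    for x1, y1 in product((0, 1), repeat=2):
        px, py = mkv_payoffs(n_stages, x1, y1)
        x_cannot_gain = px >= mkv_payoffs(n_stages, 1 - x1, y1)[0]
        y_cannot_gain = py >= mkv_payoffs(n_stages, x1, 1 - y1)[1]
        if x_cannot_gain and y_cannot_gain:
            equilibria.append((x1, y1))
    return equilibria
```

For $N = 3$ every pure pair passes the weak best-reply test, in line with the remark that the optimization conditions are then satisfied for any $(x_1, y_1)$, so the selection of $(C, C)$ has to rest on a comparison of the actual payoffs.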



The sheer number of possible strategy functions which
might be adopted makes it necessary to consider more
general functional classes. To this end, we consider that each
player adopts a mixed Markovian-Independent strategy,
denoted MKV-$k$I, where the MKV strategy is chosen
from the first to the $(N-k)$-th stage while $k$I indicates that
IND strategies are adopted for the last $k$ stages. For
player $P_x$ then, an MKV-$k$I strategy sets
\[
\begin{cases}
x_n = y_{n-1}, & 2 \leq n \leq N-k \\
x_1, x_{N-k+1}, \ldots, x_N & \text{independent.}
\end{cases} \tag{41}
\]
Similar strategy functions denoted MKV-$j$I are
implemented by $P_y$. These constraints are equivalent to
the correlated probability distributions
$P_{nx}(x_n|H_{n-1}) = \delta_{x_n, y_{n-1}}$ for $2 \leq n \leq N-k$ and
$P_{ny}(y_n|H_{n-1}) = \delta_{y_n, x_{n-1}}$ for $2 \leq n \leq N-j$.

$(\langle\Pi_x\rangle, \langle\Pi_y\rangle)$


3 = 


1 


2 


3 


4 


TV-2 


TV- 1 


k = 


TV, TV 


TV" 2 ,TV +1 




= 


= 


TV" 2 TV +1 /~ 2 


TV" 1 , TV" 1 


I 


TV +1 , TV -2 


TV -1 , TV" 1 


TV -3 , TV 






■■ TV" 3 ,TV +0/ " 3 


TV -2 , TV -2 


2 




TV, TV" 3 


TV" 2 , TV" 2 


TV" 4 , TV" 1 


= 


.. TV" 4 ^" 1 /" 4 


TV" 3 , TV -3 


3 






TV" 1 , TV" 4 


TV" 3 , TV" 3 


TV -5 , TV -2 


■■ TV" 5 ,TV" 2/ ~ 5 


TV -4 , TV" 4 


4 








TV" 2 , TV" 5 


TV 4 ,TV 4 


■■ TV" 6 ,TV~ 3/ ~ 6 


TV" 5 , TV" 5 


TV-4 










TV" 3 , TV" 6 


TV 7V+ t> /+- i 


TV +3 ,TV+ 3 


7V-3 


TV +1/ - 2 ,TV- 2 


TV+°/- 3 ,TV- 3 


TV" 1 /" 4 , TV" 4 


TV" 2/ " 5 ,TV" 5 


TV- 3/ - 6 ,TV" 6 ■ 


■ ■ tv +1 'tv+ 4 / +1 


TV +2 ,TV+ 2 


N - 2
TV +2 ,TV+ 2


TV +1 ,TV +1 


TV - 1 


TV -1 , TV" 1 


TV -2 , TV" 2 


TV" 3 , TV" 3 


TV" 4 , TV" 4 


TV -5 , TV -5 


.. TV +1 ,TV +1 


TV, TV. 



TABLE I: A partial listing of constrained equilibria for the IPD when player $P_x$ implements a Markovian strategy algorithm
MKV-$k$I, while player $P_y$ implements an MKV-$j$I algorithm. Every payoff pair shown is a constrained equilibrium point,
making selection of a single best payoff maximization strategy difficult. For brevity, we restrict consideration to large $N$ ($N > 8$,
say) and abbreviate $(N \pm k)$ and $(2N \pm k)$ using superscripted indices $\pm k$. Fractional indices $(+k/-j)$ indicate alternate equilibria with payoff
increments of $+k$ and $-j$ respectively. Ditto signs (") and equal signs (=) copy values downwards and to the right respectively.
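To make the MKV-$k$I and MKV-$j$I constraints concrete, the following sketch builds the full play sequence from the independent variables alone. It assumes $0 \leq k, j \leq N-1$, takes the free choices in stage order, and uses hypothetical function names; it only evaluates a given set of free choices and does not search for equilibria.

```python
def play_sequence(n, k, j, x_free, y_free):
    """Build (xs, ys) for MKV-kI played against MKV-jI.

    x_free = [x_1, x_{N-k+1}, ..., x_N] and y_free = [y_1, y_{N-j+1}, ..., y_N],
    with 0 = cooperate and 1 = defect. Assumes 0 <= k, j <= n - 1.
    """
    if not (0 <= k <= n - 1 and 0 <= j <= n - 1):
        raise ValueError("k and j must lie between 0 and n - 1")
    if len(x_free) != k + 1 or len(y_free) != j + 1:
        raise ValueError("expected k + 1 free x choices and j + 1 free y choices")
    xs, ys = [x_free[0]], [y_free[0]]
    for stage in range(2, n + 1):
        # Copy the opponent's previous move while the Markovian rule applies,
        # otherwise read the next independent choice.
        x = ys[-1] if stage <= n - k else x_free[stage - (n - k)]
        y = xs[-1] if stage <= n - j else y_free[stage - (n - j)]
        xs.append(x)
        ys.append(y)
    return xs, ys


def total_payoffs(xs, ys):
    """Total payoffs (Pi_x, Pi_y) summed over all stages, as in Eq. (21)."""
    px = sum(2 + x - 2 * y for x, y in zip(xs, ys))
    py = sum(2 - 2 * x + y for x, y in zip(xs, ys))
    return px, py
```

For example, with $N = 6$, $k = 0$ and $j = 5$, the free choices $x_1 = D$ and $(y_1, \ldots, y_6) = (C, \ldots, C, D)$ reproduce the play sequence of Eq. (32), and `total_payoffs` returns $(2N-1, 2N-1) = (11, 11)$.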



The general MKV-$k$I strategy subsumes a number of
other strategy functions of interest. For instance, setting
$k = N-1$ or $k = N$ makes all variables independent
(IND), so MKV-$(N-1)$I = MKV-$N$I = IND. More
interestingly, this class subsumes certain deterministic
strategies. To see this, suppose that players consider a
deterministic strategy choice D in the last stage, and so
implement the strategy MKV-1D. More generally, players
may also consider strategies MKV-$k$D forcing the choice
D in the last $k$ stages. However, it is not difficult to see
that this class of deterministic strategies is weakly
dominated by the class MKV-$k$I. For the same $k$, MKV-$k$I
guarantees an equal or larger payoff than is achievable
using MKV-$k$D against any strategy algorithm of the
opponent. In particular, the motivation to defect at the last
stage for a larger payoff is taken into account in the
strategy class MKV-$k$I. Exactly similar considerations
establish that MKV-$k$I strategies weakly dominate Tit-For-Tat
strategies, which specify cooperation in the first stage.

Although the class of strategies MKV-$k$I is small in
comparison to the set of all possible strategies, it
contains enough complexity to demonstrate novel equilibria
in the IPD. We consider player $P_x$ to implement strategy
function MKV-$k$I, while player $P_y$ implements MKV-$j$I,
so the most general expected payoff functions become
\[
\langle\Pi_z\rangle = \sum_{\substack{x_1, x_{N-k+1}, \ldots, x_N \\ y_1, y_{N-j+1}, \ldots, y_N}} P_{1x}(x_1) P_{1y}(y_1) \times \ldots
\times P_{Ny,H_{N-1}}(y_N) \Pi_z \tag{42}
\]

for $z \in \{x, y\}$, where the payoffs for a given play sequence
$(x_1, x_{N-k+1}, \ldots, x_N, y_1, y_{N-j+1}, \ldots, y_N)$ are
\[
\Pi_z = 2N + \sum_{n=1}^{N} A_{zn} x_n + \sum_{n=1}^{N} B_{zn} y_n
\]
with variables $A_{zn}$ and $B_{zn}$ as specified in Appendix A.



The assumption of bounded rationality, under which all
remaining distributions are uncorrelated, now allows
calculating the respective constrained equilibria, with the
optimized payoffs shown in Table I for all combinations
of $k$ and $j$.

Every listed payoff pair in Table I is an actual
constrained equilibrium point optimizing payoffs given the
imposed constraints. As noted previously, there is no
generally accepted method to choose between alternate Nash
equilibria. However, strategy algorithms are independently
selectable by each player, so we can think of these strategy
algorithms and constrained equilibria as creating
a new game defined by Table I. In this game on the
constrained equilibrium space, each strategy algorithm
becomes a strategy choice, and each equilibrium point
becomes a pair of payoffs. In the case where each pair
of strategy algorithms defines unique equilibrium payoffs,
this table can be considered as a game matrix. Hence
standard Nash techniques can be applied to determine
global equilibria among the located constrained equilibria.
However, we note that in general care is needed to deal
with multiple equilibria generated by a pair of strategy
algorithms. By applying the Nash equilibrium definition
to Table I we obtain global equilibria at $(k, j)$ for either
$k = 0$ and $3 \leq j \leq (N-2)$, or $j = 0$ and
$3 \leq k \leq (N-2)$.

These global equilibria can be considered rational for
the IPD in this restricted class of strategies, and there is
no established way to select a particular one among them.
The more important feature of this analysis is
that cooperation arises naturally from these equilibria.
The pathways produced by these equilibria are dominated
by cooperation apart from some different choices
at the last stage. This cooperative behaviour is caused
by the imposition of strategy function constraints.



V. CORRELATED PLAY AND THE LAST
STAGE OF THE IPD

In the previous sections, we have emphasized the im- 
portance of correlated play in supergame analysis and 
the assumption of independent mixed or behavioural 
strategy choice variables required by the Nash equi- 
librium definition. Although experiments on the IPD 
(and other games) have shown significantly different be- 
haviours from that predicted by the Nash analysis, there 
has been little motivation to revise this assumption and 
extend the Nash definition of equilibrium to a fully cor- 
related analysis. Many alternate approaches have been 
proposed, and the field has developed in the direction 
of explaining why people do not behave as rationally as 
they could. In the IPD this explanatory emphasis re- 
sulted largely because of the typical use of the backwards 
induction (BI) argument. However, the results obtained 
by our general correlated analysis differ from the predic- 
tions of the typical BI argument, implying misuse of the 
BI optimizing technique in application to the IPD anal- 
ysis. In this section we clarify the confusion introduced 
by the typical BI argument in the IPD analysis. 

As noted, BI is based on dynamic programming techniques,
and like any variational optimization technique,
it is applied only to uncorrelated independent variables.
That is, any correlations or constraints must be resolved
before optimization commences. Necessarily then, BI
must derive the same solution as the Nash analysis under
the same assumed constraint set. The BI argument in the
IPD is commonly used to justify the assumption of
independent variables underlying the Nash equilibrium
solution. As is well known, the rationale commences with
the last stage of a finite supergame. In the typical BI
argument, the last stage has a special role, and the rest of
the argument follows in exactly the same way by iteration.
In particular, the usual claim is that BI requires that the
last stage of an IPD is to be considered an independent
stage. It is argued that, although there are many reasons
for players to cooperate, such as the presence of long range
considerations [20], reputation effects [60], off-equilibrium
pathway signals [61], or punishments and rewards [59], at
the last stage none of these are present. Hence, it is
claimed, the last stage is an independent stage, i.e. a single
PD. Once we have accepted the last stage as a single PD,
then under CKR, BI automatically derives the unique
Nash path. In this sense, BI is a complement of the Nash
analysis to establish the unique Nash solution of the IPD.
This argument would succeed with the addition of a proof
that these, and only these, effects permitted correlated
play. Unfortunately, this is not the case.

The significant difference introduced by allowing ratio- 
nal players to adopt correlated play whenever that turns 
out to be payoff maximizing, is that they will consider the 
IPD as a whole, which they are able to do as unbounded 
rational agents. The correlated analysis in the previous 
sections has shown that the first step of the BI argument
about the last stage is not necessarily optimal. In addition,
CKR by itself specifies nothing about whether the
last stage of an IPD is or is not equivalent to a single 
stage game. However, under CKR, players of unbounded 
rationality must take account of the payoffs available un- 
der correlated play. In general then, the typical usage of a 
BI argument to justify a Nash pathway as being optimal 
is not correct under CKR and correlated strategies are 
required for an optimal solution. For the typical BI ar- 
gument to justify the Nash pathway as uniquely optimal, 
it is necessary that the game analysis, and hence CKR, 
is restricted to a particular kind of correlation, namely 
the assumption of independent variables. Even though 
there are apparently no overt motivations for players to 
cooperate in the last stage, it is not optimal and hence 
not rational for players to even consider "what should we 
do if we are at the last stage?" . 



VI. CONCLUSION 

This paper defines novel constrained equilibria in the 
middle ground between current definitions which require 
that supergame strategy choices be either all indepen- 
dent or all fully specified by strategy functions. We em- 
ploy standard conditioned history expansions of the joint 
correlated probability distributions describing expected 
payoff functions which specify only some (or no) strat- 
egy choice variables (a dependent set) in terms of other 
variables (an independent set). We then apply standard 
optimization procedures such as Nash equilibria proce- 
dures or backwards induction to the composite payoff 
functions defined over the remaining independent vari- 
ables to locate novel constrained equilibria. The meth- 
ods developed in this paper ensure that there is no con- 
flict between game theoretic optimization techniques and 
more general variational optimization procedures. We
derive novel constrained equilibria in the finite iterated
prisoner's dilemma showing, in particular, that backwards
induction establishes that it can be payoff maximizing to
cooperate in the finite IPD, including in the last stage of
this game. These results contrast with existing claims
that payoff maximization requires defection in this last
stage, but those claims all depend on the a priori assumption
that choice variables are uncorrelated. We conclude
by discussing the validity of typical arguments proving 
that ALL DEFECT is the privileged optimal pathway in 
the IPD. 

This paper derived novel constrained equilibria using 
general Markovian-like strategy functions, and a broader 
analysis of the many possible unknown contingent strate- 
gies requires a functional representation for algorithms. 
A mathematical functional analysis would allow us to 
consider multiple algorithms using a functional metric 
to measure the ability of different algorithms to deter- 
mine supergame outcomes. However, the utility of such 
a functional analysis remains an open question and will 
be dealt with in later work.

Game theory was originally proposed to provide a simple
analytic environment for economic and social interactions
and the analysis of this paper has broader application
to this wider sphere. For instance, constrained
equilibria eliminate the first-mover advantage in iterated
bargaining games and explain experimental observations
of more equitable play. In such applications however, the
lack of a systematic way to select a particular equilibrium
among others becomes a more serious and important
problem. This issue and the broader application to
economics, social games and evolutionary games will be
addressed in later work.



APPENDIX A: CONSTRAINED EQUILIBRIA IN THE IPD



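Suppose player $P_x$ adopts an MKV-$k$I strategy and player $P_y$ adopts an MKV-$j$I strategy function leaving
independent variables $(x_1, x_{N-k+1}, \ldots, x_N)$ and $(y_1, y_{N-j+1}, \ldots, y_N)$, where we also assume $N > 3$.

For $1 \leq k \leq (N-1)$ and $j = 0$, the independent variables are $(x_1, x_{N-k+1}, \ldots, x_N)$ and $y_1$, and payoffs are
\begin{align*}
\Pi_x &= \begin{cases}
2N + \frac{k-N}{2} x_1 + \frac{k-N-4}{2} y_1 - \sum_{n=N-k+1}^{N-1} x_n + x_N, & (N-k) \text{ even} \\
2N + \frac{k-N-1}{2} x_1 + \frac{k-N-3}{2} y_1 - \sum_{n=N-k+1}^{N-1} x_n + x_N, & (N-k) \text{ odd}
\end{cases} \\
\Pi_y &= \begin{cases}
2N + \frac{k-N}{2} x_1 + \frac{2+k-N}{2} y_1 - \sum_{n=N-k+1}^{N-1} x_n - 2x_N, & (N-k) \text{ even} \\
2N + \frac{k-N-1}{2} x_1 + \frac{3+k-N}{2} y_1 - \sum_{n=N-k+1}^{N-1} x_n - 2x_N, & (N-k) \text{ odd.}
\end{cases} \tag{A1}
\end{align*}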

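For $1 \leq k \leq (N-1)$ and $j = (N-1)$, the independent variables are $(x_1, x_{N-k+1}, \ldots, x_N)$ and $(y_1, \ldots, y_N)$, and
payoffs are
\begin{align*}
\Pi_x &= 2N + x_1 + \sum_{n=N-k+1}^{N} x_n - \sum_{n=1}^{N-k-1} y_n - 2 \sum_{n=N-k}^{N} y_n, \\
\Pi_y &= 2N - 2x_1 - 2 \sum_{n=N-k+1}^{N} x_n - \sum_{n=1}^{N-k-1} y_n + \sum_{n=N-k}^{N} y_n. \tag{A2}
\end{align*}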
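A closed form of this kind can be cross-checked against direct stage-by-stage summation. The sketch below treats the case $j = N-1$, in which $P_x$ plays MKV-$k$I and every $y_n$ is free. The closed form coded in `closed_form_payoffs` is derived here from the stage payoffs of Eq. (21) and the MKV constraint, so it should be read as an assumption to be compared with the published expression; all function names are hypothetical.

```python
from itertools import product


def direct_payoffs(n, k, x_free, ys):
    """Sum stage payoffs when P_x plays MKV-kI and every y_n is free.

    x_free = [x_1, x_{N-k+1}, ..., x_N]; ys = [y_1, ..., y_N].
    """
    xs = [x_free[0]]
    for stage in range(2, n + 1):
        # ys[stage - 2] is y_{stage-1}, the opponent's previous move.
        xs.append(ys[stage - 2] if stage <= n - k else x_free[stage - (n - k)])
    px = sum(2 + x - 2 * y for x, y in zip(xs, ys))
    py = sum(2 - 2 * x + y for x, y in zip(xs, ys))
    return px, py


def closed_form_payoffs(n, k, x_free, ys):
    """Closed form assumed in this sketch for the case j = N - 1."""
    tail_x = sum(x_free[1:])        # x_{N-k+1} + ... + x_N
    head_y = sum(ys[: n - k - 1])   # y_1 + ... + y_{N-k-1}
    tail_y = sum(ys[n - k - 1:])    # y_{N-k} + ... + y_N
    px = 2 * n + x_free[0] + tail_x - head_y - 2 * tail_y
    py = 2 * n - 2 * x_free[0] - 2 * tail_x - head_y + tail_y
    return px, py


def forms_agree(n):
    """Compare both computations for all 1 <= k <= n - 1 and all binary choices."""
    for k in range(1, n):
        for x_free in product((0, 1), repeat=k + 1):
            for ys in product((0, 1), repeat=n):
                if direct_payoffs(n, k, x_free, ys) != closed_form_payoffs(n, k, x_free, ys):
                    return False
    return True
```

The same pattern, a direct simulation compared with a candidate closed form over all binary assignments, can be reused for the other cases of this appendix when $N$ is small.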

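For $1 \leq k = j \leq (N-1)$, the independent variables are $(x_1, x_{N-k+1}, \ldots, x_N)$ and $(y_1, y_{N-k+1}, \ldots, y_N)$, and payoffs
are
\begin{align*}
\Pi_x &= \begin{cases}
2N + \frac{k-N}{2} x_1 + \frac{k-N}{2} y_1 + \sum_{n=N-k+1}^{N} x_n - 2 \sum_{n=N-k+1}^{N} y_n, & (N-k) \text{ even} \\
2N + \frac{3+k-N}{2} x_1 + \frac{k-N-3}{2} y_1 + \sum_{n=N-k+1}^{N} x_n - 2 \sum_{n=N-k+1}^{N} y_n, & (N-k) \text{ odd}
\end{cases} \\
\Pi_y &= \begin{cases}
2N + \frac{k-N}{2} x_1 + \frac{k-N}{2} y_1 - 2 \sum_{n=N-k+1}^{N} x_n + \sum_{n=N-k+1}^{N} y_n, & (N-k) \text{ even} \\
2N + \frac{k-N-3}{2} x_1 + \frac{3+k-N}{2} y_1 - 2 \sum_{n=N-k+1}^{N} x_n + \sum_{n=N-k+1}^{N} y_n, & (N-k) \text{ odd.}
\end{cases} \tag{A3}
\end{align*}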

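For $1 \leq k \leq (N-1)$ and $1 \leq j \leq (N-1)$ with $k > j$, the independent variables are $(x_1, x_{N-k+1}, \ldots, x_N)$ and
$(y_1, y_{N-j+1}, \ldots, y_N)$, and payoffs are
\begin{align*}
\Pi_x &= \begin{cases}
2N + \frac{k-N}{2} x_1 + \frac{k-N-4}{2} y_1 - \sum_{n=N-k+1}^{N-j-1} x_n + \sum_{n=N-j}^{N} x_n - 2 \sum_{n=N-j+1}^{N} y_n, & (N-k) \text{ even} \\
2N + \frac{k-N-1}{2} x_1 + \frac{k-N-3}{2} y_1 - \sum_{n=N-k+1}^{N-j-1} x_n + \sum_{n=N-j}^{N} x_n - 2 \sum_{n=N-j+1}^{N} y_n, & (N-k) \text{ odd}
\end{cases} \\
\Pi_y &= \begin{cases}
2N + \frac{k-N}{2} x_1 + \frac{2+k-N}{2} y_1 - \sum_{n=N-k+1}^{N-j-1} x_n - 2 \sum_{n=N-j}^{N} x_n + \sum_{n=N-j+1}^{N} y_n, & (N-k) \text{ even} \\
2N + \frac{k-N-1}{2} x_1 + \frac{3+k-N}{2} y_1 - \sum_{n=N-k+1}^{N-j-1} x_n - 2 \sum_{n=N-j}^{N} x_n + \sum_{n=N-j+1}^{N} y_n, & (N-k) \text{ odd.}
\end{cases} \tag{A4}
\end{align*}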